Abstract

FUN.STAT  is  a  name  proposed  by  us  to  describe  a  synthesis  of  statistical 
reasoning  which  combines  quantiles  and  quantile-densities,  information  and 
entropy,  and  functional  statistical  inference.  This  paper  describes  a 

FUN.STAT QUANTILE APPROACH TO
TWO SAMPLE STATISTICAL DATA ANALYSIS

Part 1. Two Sample Problems as Data Analysis of Stochastic Process
D(u), 0<u<1

1.1  Univariate:  one  sample  problem 

1.2  Univariate:  two  sample  problem 

1.3  Representations  of  D(u) 

Part  2.  Functional  Statistical  Inference  Approach  to  Two  Sample  Problem 

2.1  Introduction 

2.2  Density  Estimation,  Kernels,  and  Windows 

2.3  Parametric  "Non-parametric"  Tests  based  on  Location 
Scale  Parameter  Models 

2.4 Autoregressive estimation of d(u) and tests of H_0

Part  3.  Asymptotic  Distributions  of  Stochastic  Processes  Arising  in  Two 
Sample  Quantile  Data  Analysis 

3.1  Introduction 

3.2  Conjectures  in  Distribution  of  D(u) 

3.3  Density  Estimation  and  Differential  Variance 


Part 4. Summary of Two Sample Quantile Data Analysis Using TWOSAM


Part 1. Two Sample Problems as Data Analysis of Stochastic Process D(u), 0<u<1


The univariate: one sample problem of statistical data analysis considers
a random sample $X_1,\ldots,X_n$ of a continuous random variable $X$ and seeks to infer
its probability law.

In the quantile approach to the study of the probability law of a random
variable $X$, the functions to be estimated are [Parzen (1979)]:

distribution function $F(x) = \Pr[X \le x]$, $-\infty<x<\infty$;
quantile function $Q(u) = F^{-1}(u) = \inf\{x: F(x) \ge u\}$, $0\le u\le 1$;
probability density function $f(x) = F'(x)$, $-\infty<x<\infty$;
quantile-density $q(u) = Q'(u)$, $0<u<1$;
density-quantile $fQ(u) = f(F^{-1}(u)) = \{q(u)\}^{-1}$, $0<u<1$;
score-function $J(u) = -(fQ)'(u) = -f'(F^{-1}(u))/f(F^{-1}(u))$, $0<u<1$.

When $F$ is continuous, $FF^{-1}(u) = u$. When $F^{-1}$ is continuous, $F^{-1}F(x) = x$.

We call $X$ (or $F$) bi-continuous if both $F$ and $F^{-1}$ are continuous functions.
When $F$ is bi-continuous, then $F^{-1}$ is a true inverse in the sense that
$F(x) = u$ if and only if $F^{-1}(u) = x$.

To estimate a function [for example, $D(u)$, $0<u<1$] three types of
estimators should be distinguished: fully parametric [denoted D(u)];
fully non-parametric [denoted $\tilde D(u)$]; and functional parametric [denoted
$\hat D(u)$]. These types of estimators have the following characteristics:

(1) fully non-parametric makes almost no assumptions about a model for the
function; (2) fully parametric estimates the parameters of an assumed
finite-parametric model for the function; and (3) functional parametric
estimates the parameters of an approximating parametric model whose order
is determined (selected) by the data.
Fully non-parametric estimators of $F(x)$ and $F^{-1}(u)$ [given a random
sample $X_1,\ldots,X_n$ with order statistics $X_{(1)}<\cdots<X_{(n)}$] are defined by the
sample distribution function $\tilde F(x)$ and the sample quantile function $\tilde F^{-1}(u)$.
We systematically use ~ to denote a sample function which is a raw (or fully
non-parametric) estimator of a function defined on the ensemble or population.

Definition. Sample distribution $\tilde F$ and sample quantile $\tilde F^{-1}$. For a
sample $X_1,\ldots,X_n$, $\tilde F$ and $\tilde F^{-1}$ are piecewise constant functions satisfying

$$\tilde F(x) = \frac{j}{n}, \qquad X_{(j)} \le x < X_{(j+1)}, \qquad j=0,1,\ldots,n;$$

$$\tilde F^{-1}(u) = X_{(j)}, \qquad \frac{j-1}{n} < u \le \frac{j}{n}, \qquad j=1,\ldots,n;$$

where $X_{(0)} = -\infty$ and $X_{(n+1)} = \infty$.
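The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `sample_distribution` and `sample_quantile` are hypothetical (they do not come from the paper or from TWOSAM), and the sample is assumed to have no ties.

```python
import bisect
import math

def sample_distribution(sample):
    """Return the step function x -> (number of observations <= x) / n."""
    xs = sorted(sample)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def sample_quantile(sample):
    """Return the step function u -> X_(j) for (j-1)/n < u <= j/n."""
    xs = sorted(sample)
    n = len(xs)

    def quantile(u):
        if not 0.0 < u <= 1.0:
            raise ValueError("u must lie in (0, 1]")
        # Smallest j with u <= j/n; the tiny offset guards against
        # floating-point round-up when u equals j/n exactly.
        j = max(1, math.ceil(n * u - 1e-12))
        return xs[j - 1]

    return quantile

F = sample_distribution([3.0, 1.0, 2.0])
Q = sample_quantile([3.0, 1.0, 2.0])
```

Here `F(2.0)` counts two of the three observations, and `Q(0.5)` returns the second order statistic because 1/3 < 0.5 <= 2/3.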


A basic question of the univariate: one sample theory is the goodness
of fit problem: to test the null hypothesis $H_0: F(x) = F_0(x)$, where $F_0(x)$
is a specified continuous distribution function. The mathematical
statistician is concerned with finding the exact and asymptotic distributions
under both null and alternative hypotheses of statistics such as

$$D_n = \sup_{-\infty<x<\infty} |\tilde F(x) - F_0(x)|,$$

$$W_n^2 = \int_{-\infty}^{\infty} \{\tilde F(x) - F_0(x)\}^2 \, dF_0(x).$$

By a formal change of variable $x = Q_0(u) = F_0^{-1}(u)$ one can write

$$D_n = \sup_{0<u<1} |\tilde F F_0^{-1}(u) - u|,$$

$$W_n^2 = \int_0^1 \{\tilde F F_0^{-1}(u) - u\}^2 \, du.$$

One rigorously obtains these formulas by interpreting $\tilde F F_0^{-1}(u)$ as the sample
distribution function $\tilde F_U(u)$ of the random variables $U_1 = F_0(X_1),\ldots,U_n = F_0(X_n)$.
The null hypothesis $H_0$ is equivalent to the hypothesis that $U = F_0(X)$ is
uniform on [0,1].

As an alternative to testing the sample distribution function for uniformity,
one could test for uniformity the sample quantile function $\tilde F_U^{-1}(u) = \tilde Q_U(u) =
F_0 \tilde F^{-1}(u)$.

A Brownian Bridge process $B(u)$, $0\le u\le 1$, is a zero mean Gaussian process
with covariance kernel

$$E[B(u_1)B(u_2)] = \min(u_1,u_2) - u_1 u_2.$$

The asymptotic distribution of test statistics such as $D_n$ and $W_n^2$ is
based on the convergence theorems [as sample size $n$ tends to $\infty$]: assuming $H_0$,

$$\tilde B_F(u) = \sqrt{n}\,\{\tilde F F_0^{-1}(u) - u\} \xrightarrow{d} B_F(u),$$

$$\sqrt{n}\,\{F_0 \tilde F^{-1}(u) - u\} \xrightarrow{d} -B_F(u),$$

where $\xrightarrow{d}$ denotes convergence in distribution of stochastic processes and $B_F(u)$
is a limiting process distributed as a Brownian Bridge process.
A quick and dirty definition of convergence in distribution of stochastic
processes is as follows:

$$\{\tilde B_F(u),\ 0\le u\le 1\} \xrightarrow{d} \{B_F(u),\ 0\le u\le 1\}$$

if and only if for every bounded continuous functional $g(x(u), 0\le u\le 1)$ on a
suitable metric space of functions $x(u)$, $0\le u\le 1$,

$$E[g(\tilde B_F(u),\ 0\le u\le 1)] \to E[g(B_F(u),\ 0\le u\le 1)].$$

Testing for uniformity a random sample $U_1,\ldots,U_n$ of points on the unit
interval is a canonical problem of statistics in the sense that many other
statistical problems can be transformed to this problem. One way to
determine the appropriate transformation is by attempting to find the
analogues of the test statistics $D_n$ and $W_n^2$. To develop such analogues, one
might compare computational formulas.

Computational formulas for $D_n$ and $W_n^2$ [which are well known in the theory
of goodness of fit tests, see Durbin (1973)] can be stated in terms of a general
distribution function $\tilde D(u)$, $0\le u\le 1$, defined in terms of a set of specified
constants $U_j$, where $0 = U_0 < U_1 < \cdots < U_n < U_{n+1} = 1$; define

$$\tilde D(u) = 0, \qquad 0 = U_0 \le u < U_1;$$
$$\tilde D(u) = \frac{j}{n}, \qquad U_j \le u < U_{j+1}, \quad j=1,\ldots,n-1;$$
$$\tilde D(u) = 1, \qquad U_n \le u \le U_{n+1} = 1.$$

Then

$$D_n = \max_{0\le u\le 1} |\tilde D(u) - u| = \max_{1\le j\le n} \max\left\{ \frac{j}{n} - U_j,\ U_j - \frac{j-1}{n} \right\},$$

$$W_n^2 = \int_0^1 \{\tilde D(u) - u\}^2 \, du = \frac{1}{12n^2} + \frac{1}{n} \sum_{j=1}^{n} \left( U_j - \frac{2j-1}{2n} \right)^2.$$
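The two computational formulas translate directly into code. The following is a minimal sketch, assuming the points $U_j$ are distinct and lie in (0,1); the function name `ks_and_cvm` is hypothetical.

```python
def ks_and_cvm(u_values):
    """Return (D_n, W_n^2) for points U_1..U_n on the unit interval.

    D_n   = max_j max{ j/n - U_(j), U_(j) - (j-1)/n }
    W_n^2 = 1/(12 n^2) + (1/n) * sum_j (U_(j) - (2j-1)/(2n))^2
    """
    us = sorted(u_values)
    n = len(us)
    d_n = max(
        max(j / n - u, u - (j - 1) / n)
        for j, u in enumerate(us, start=1)
    )
    w2_n = 1.0 / (12 * n * n) + sum(
        (u - (2 * j - 1) / (2 * n)) ** 2
        for j, u in enumerate(us, start=1)
    ) / n
    return d_n, w2_n

d_n, w2_n = ks_and_cvm([0.9, 0.2])
```

For the two points 0.2 and 0.9 the step function is 0 on [0, 0.2), 1/2 on [0.2, 0.9) and 1 on [0.9, 1], so the largest deviation from the diagonal is 0.9 - 0.5 = 0.4, and integrating the squared deviation piece by piece gives 1/30, in agreement with the closed form.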


The univariate: two sample problem of statistical data analysis
considers random samples $X_1,\ldots,X_m$ and $Y_1,\ldots,Y_n$ respectively representing
measurements on random variables $X$ and $Y$ with continuous distribution
functions $F(x)$ and $G(x)$. We interpret $X$ and $Y$ as measurements of a physical
variable in two different populations. The null hypothesis $H_0: F(x)=G(x)$ of
equality of distributions is to be tested, if possible without specifying
the distributional shapes. This paper assumes that $F$ and $G$ are bi-continuous
(that is, $F$, $G$, $F^{-1}$, $G^{-1}$ are continuous functions).

The random sample $X_1,\ldots,X_m$ has sample distribution $\tilde F$ and quantile $\tilde F^{-1}$.
The random sample $Y_1,\ldots,Y_n$ has sample distribution $\tilde G$ and quantile $\tilde G^{-1}$.
We denote by $\tilde H$ the sample distribution of the pooled sample $X_1,\ldots,X_m$,
$Y_1,\ldots,Y_n$. It can be represented

$$\tilde H(x) = \lambda \tilde F(x) + (1-\lambda)\, \tilde G(x),$$

defining $N = m+n$ and $\lambda = m/N$. The limit of $\tilde H(x)$ is $H(x) = \lambda F(x) + (1-\lambda) G(x)$.
We assume that as $N$ tends to $\infty$, $\lambda$ tends to a limit satisfying $0<\lambda<1$.

Techniques in the two-sample problem which are close counterparts of
one sample techniques are the statistics

$$D_{mn} = \sup_{-\infty<x<\infty} |\tilde F(x) - \tilde G(x)|,$$

$$W_{mn}^2 = \int_{-\infty}^{\infty} \{\tilde F(x) - \tilde G(x)\}^2 \, d\tilde H(x).$$

Durbin (1973) states computational formulas in terms of the ranks
$R_j$, $j=1,\ldots,m$, in the pooled sample of the j-th largest observation in the
X-sample. In our notation these formulas become
By comparing these formulas with the general computational formulas in the
one sample case one sees that the test statistics $D_{mn}$ and $W_{mn}^2$ are related
to the statistics

$$\max_{0<u<1} |\tilde D(u) - u|, \qquad \int_0^1 (\tilde D(u) - u)^2 \, du,$$

defining $\tilde D(u)$, $0\le u\le 1$, as follows:

$$\tilde D(u) = 0, \qquad 0 \le u < \frac{R_1}{N};$$
$$\tilde D(u) = \frac{j}{m}, \qquad \frac{R_j}{N} \le u < \frac{R_{j+1}}{N}, \quad j=1,\ldots,m-1;$$
$$\tilde D(u) = 1, \qquad \frac{R_m}{N} \le u \le 1.$$

One aim of this paper is: (1) to relate the process $\tilde D(u)$ to the processes
$\tilde F$ and $\tilde H^{-1}$, (2) to relate $\tilde D(u)$ to representations of linear rank statistics,
(3) to relate $\tilde D(u)$ to quantiles, and (4) to use $\tilde D(u)$ graphically for a
complete data analysis of the null hypothesis $H_0$.

1.3 Representations of D(u)

We have defined $R_j$ to be the rank in the pooled sample of $X_{(j)}$, the
j-th order statistic in the X sample. A more precise definition of rank is
given in terms of the sample distribution $\tilde H$:

$$R_j = N \tilde H(X_{(j)}), \qquad j=1,\ldots,m.$$

Another insight into the definition of rank is provided by the formula, of
later use,

$$\tilde H^{-1}(u) = X_{(j)}, \qquad \frac{R_j - 1}{N} < u \le \frac{R_j}{N};$$

note $\tilde H^{-1}(u)$ equals the k-th order statistic in the pooled sample for
$(k-1)/N < u \le k/N$.

The null hypothesis $H_0: F(x) = G(x)$ is often tested by means of linear
rank statistics of the form

$$T_N = \frac{1}{m} \sum_{j=1}^{m} J\!\left(\frac{R_j}{N+1}\right),$$

where $J(u)$ is a suitable weight function called a score function. Some
frequently suggested score functions are listed in Table 1A.


Table 1A

Score functions J(u) for two-sample linear rank statistics

Test for Location Difference                      Test for Scale Difference

$\Phi^{-1}(u)$     Normal scores test             $|\Phi^{-1}(u)|^2$               Normal scores test
$u - 0.5$          Wilcoxon-Mann-Whitney test     $(u - 0.5)^2$                    Mood test
Sign$(u - 0.5)$    Median test                    Sign$\{|u - 0.5| - 0.25\}$       Quantile test
                                                  $|u - 0.5| - 0.25$               Ansari-Bradley test
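To make the role of the score function concrete, here is a small Python sketch that evaluates $T_N = (1/m)\sum_j J(R_j/(N+1))$ for the scores of Table 1A. The dictionary keys and helper names are hypothetical labels chosen for this illustration, and the samples are assumed to contain no ties.

```python
from bisect import bisect_right
from statistics import NormalDist

def x_ranks(x, y):
    """Ranks R_1 < ... < R_m of the X order statistics in the pooled sample."""
    pooled = sorted(list(x) + list(y))
    return [bisect_right(pooled, xi) for xi in sorted(x)]

def sign(z):
    return (z > 0) - (z < 0)

# Score functions J(u) from Table 1A (keys are illustrative labels only).
SCORES = {
    "normal_location": lambda u: NormalDist().inv_cdf(u),
    "wilcoxon": lambda u: u - 0.5,
    "median": lambda u: sign(u - 0.5),
    "normal_scale": lambda u: NormalDist().inv_cdf(u) ** 2,
    "mood": lambda u: (u - 0.5) ** 2,
    "quantile": lambda u: sign(abs(u - 0.5) - 0.25),
    "ansari_bradley": lambda u: abs(u - 0.5) - 0.25,
}

def linear_rank_statistic(x, y, score):
    """T_N = (1/m) * sum_j J(R_j / (N + 1))."""
    ranks = x_ranks(x, y)
    n_total = len(x) + len(y)
    return sum(score(r / (n_total + 1)) for r in ranks) / len(ranks)

x, y = [2, 4], [1, 3, 5, 6]
t_wilcoxon = linear_rank_statistic(x, y, SCORES["wilcoxon"])
t_median = linear_rank_statistic(x, y, SCORES["median"])
```

With ranks 2 and 4 out of N = 6, the Wilcoxon score gives (2/7 + 4/7)/2 - 1/2 = -1/14, and the median score gives 0 because one X value falls on each side of u = 1/2.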

To study asymptotically the distribution of $T_N$ various representations
have been introduced. The celebrated Chernoff-Savage (1958) representation
of a linear rank statistic $T_N$ is

$$T_N = \int_{-\infty}^{\infty} J_N\!\left(\frac{N}{N+1} \tilde H(x)\right) d\tilde F(x).$$

The Pyke-Shorack (1968) representation is

$$T_N = \int_0^1 \tilde F \tilde H^{-1}(u) \, d\nu_N(u) = \sum_{i=1}^{N} \tilde F \tilde H^{-1}\!\left(\frac{i}{N}\right) \left\{ \nu_N\!\left(\frac{i}{N}\right) - \nu_N\!\left(\frac{i-1}{N}\right) \right\}$$

for a suitable signed measure $\nu_N$. Chernoff-Savage establish directly a limit
theorem for $T_N$, while Pyke-Shorack derive the convergence properties of $T_N$
from the convergence properties of the process $\tilde F \tilde H^{-1}(u)$, $0\le u\le 1$ [see Pyke (1970)].

The functional statistical inference approach proposed in this paper is
based on the proposition that $\tilde F \tilde H^{-1}$, the Pyke-Shorack two-sample process,
may be appropriate for theorem proving but is not directly useable for
exploratory data analysis. The role of the score function $J(u)$ is preserved,
and functions whose graphs are suitable for data analysis are obtained, by
using

$\tilde D_1(u) = \tilde H \tilde F^{-1}(u)$, estimator of $D_1(u) = HF^{-1}(u)$;

$\tilde D(t) = \tilde D_1^{-1}(t)$, estimator of $D(t) = D_1^{-1}(t) = FH^{-1}(t)$.

Note that $\tilde D_1^{-1}(t)$ is a right continuous function which is the inverse of the
left continuous function $\tilde D_1(u)$; it is defined by

$$\tilde D_1^{-1}(t) = \sup\{u: \tilde D_1(u) \le t\}.$$

Theorem 1B shows that $\tilde D(u)$ is computationally exactly the same as the
process $\tilde D(u)$ in terms of which we approximately expressed $D_{mn}$ and $W_{mn}^2$.
Theorem 1A is further evidence that one can introduce a process $\tilde D(u)$ such
that many conventional two sample statistics are functionals of this process.
The univariate: two sample problem is thus transformed to a problem of
statistical inference from a continuous parameter process $\tilde D(u)$, $0\le u\le 1$.
We call this a problem of functional statistical inference.

The branch of statistical theory which we call "functional (statistical)
inference" (FUN.STAT) is a branch of "abstract inference" [Grenander (1981)].
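Theorem 1A: Functional Representations of Linear Rank Statistic

$$T_N = \frac{1}{m} \sum_{j=1}^{m} J\!\left(\frac{R_j}{N+1}\right) = \int_0^1 J\!\left(\frac{N}{N+1} \tilde D_1(u)\right) du = \int_0^1 J\!\left(\frac{N}{N+1}\, t\right) d\tilde D(t).$$

Proof: Define $\tilde D_1(u) = \tilde H \tilde F^{-1}(u)$. In the Chernoff-Savage representation
$T_N = \int_{-\infty}^{\infty} J_N\!\left(\frac{N}{N+1}\tilde H(x)\right) d\tilde F(x)$ make the change of variable $u = \tilde F(x)$ to obtain
$T_N = \int_0^1 J_N\!\left(\frac{N}{N+1} \tilde D_1(u)\right) du$. The change of variable $t = \tilde D_1(u)$ completes the proof.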


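Theorem 1B: Explicit Formulas for $\tilde D_1(u)$ and $\tilde D(t)$.

$\tilde D_1(u)$, $0\le u\le 1$, is piecewise-constant, non-decreasing, left continuous, and
satisfies

$$\tilde D_1(u) = \frac{R_j}{N}, \qquad \frac{j-1}{m} < u \le \frac{j}{m}, \quad j=1,\ldots,m.$$

$\tilde D(t) = \tilde D_1^{-1}(t)$ is piecewise-constant, non-decreasing, right continuous, and
satisfies $\tilde D(0) = 0$,

$$\tilde D\!\left(\frac{R_j}{N}\right) = \frac{j}{m}, \qquad j=1,\ldots,m.$$

More precisely,

$$\tilde D(t) = 0, \qquad 0 \le t < \frac{R_1}{N};$$
$$\tilde D(t) = \frac{j}{m}, \qquad \frac{R_j}{N} \le t < \frac{R_{j+1}}{N}, \quad j=1,\ldots,m-1;$$
$$\tilde D(t) = 1, \qquad \frac{R_m}{N} \le t \le 1.$$

The Pyke-Shorack process $\tilde F \tilde H^{-1}(u)$ is given by

$$\tilde F \tilde H^{-1}(u) = \frac{j}{m}, \qquad \frac{R_j - 1}{N} < u \le \frac{R_{j+1} - 1}{N}, \quad j=1,\ldots,m-1;$$
$$\tilde F \tilde H^{-1}(u) = 0, \qquad 0 < u \le \frac{R_1 - 1}{N};$$
$$\tilde F \tilde H^{-1}(u) = 1, \qquad \frac{R_m - 1}{N} < u \le 1.$$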


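Example. Suppose $m=2$, $n=4$, $X_1=2$, $X_2=4$, $Y_1=1$, $Y_2=3$, $Y_3=5$, $Y_4=6$.
Then $R_1=2$, $R_2=4$, and

$$\tilde D_1(u) = \tfrac{2}{6}, \quad 0 < u \le \tfrac{1}{2}; \qquad \tilde D_1(u) = \tfrac{4}{6}, \quad \tfrac{1}{2} < u \le 1;$$

$$\tilde D(t) = 0, \quad 0 \le t < \tfrac{2}{6}; \qquad \tilde D(t) = \tfrac{1}{2}, \quad \tfrac{2}{6} \le t < \tfrac{4}{6}; \qquad \tilde D(t) = 1, \quad \tfrac{4}{6} \le t \le 1;$$

$$\tilde F \tilde H^{-1}(u) = 0, \quad 0 < u \le \tfrac{1}{6}; \qquad \tilde F \tilde H^{-1}(u) = \tfrac{1}{2}, \quad \tfrac{1}{6} < u \le \tfrac{3}{6}; \qquad \tilde F \tilde H^{-1}(u) = 1, \quad \tfrac{3}{6} < u \le 1.$$

The statistic

$$T_N = \frac{1}{m} \sum_{j=1}^{m} \frac{R_j}{N+1}$$

is the Wilcoxon statistic (up to a constant multiple); it corresponds to
$J(t) = t$. The value of $T_N$ is 3/7; it can be evaluated by the defining
sum or by the representations

$$T_N = \int_0^1 \tfrac{6}{7} \tilde D_1(u) \, du = \tfrac{6}{7}\left[\tfrac{2}{6}\cdot\tfrac{1}{2} + \tfrac{4}{6}\cdot\tfrac{1}{2}\right] = \tfrac{3}{7},$$

$$T_N = \int_0^1 \tfrac{6}{7}\, t \, d\tilde D(t) = \tfrac{6}{7}\left[\tfrac{2}{6}\cdot\tfrac{1}{2} + \tfrac{4}{6}\cdot\tfrac{1}{2}\right] = \tfrac{3}{7}.$$

The Pyke-Shorack representation would be evaluated

$$\tfrac{1}{2}\left\{\nu_N(\tfrac{3}{6}) - \nu_N(\tfrac{1}{6})\right\} + 1\cdot\left\{\nu_N(1) - \nu_N(\tfrac{3}{6})\right\}$$

if one bothered to discover the values of the measure $\nu_N$ corresponding to $J(u)$.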
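The worked example can be checked with exact rational arithmetic. The sketch below is illustrative only: the helper names (`x_ranks`, `d1_tilde`, `d_tilde`) are hypothetical, the sample is assumed to have no ties, and the two step functions are coded directly from the explicit formulas of Theorem 1B.

```python
import math
from bisect import bisect_right
from fractions import Fraction

def x_ranks(x, y):
    """Ranks of the X order statistics in the pooled sample."""
    pooled = sorted(list(x) + list(y))
    return [bisect_right(pooled, xi) for xi in sorted(x)]

def d1_tilde(ranks, n_total, u):
    """Left-continuous step function: R_j/N for (j-1)/m < u <= j/m."""
    j = math.ceil(u * len(ranks))
    return Fraction(ranks[j - 1], n_total)

def d_tilde(ranks, n_total, t):
    """Right-continuous inverse: (number of R_j with R_j/N <= t) / m."""
    return Fraction(bisect_right(ranks, t * n_total), len(ranks))

x, y = [2, 4], [1, 3, 5, 6]
ranks = x_ranks(x, y)
m, n_total = len(x), len(x) + len(y)
scale = Fraction(n_total, n_total + 1)  # the factor N/(N+1) = 6/7

# Defining sum with J(t) = t.
t_sum = sum(Fraction(r, n_total + 1) for r in ranks) / m

# Integral of (N/(N+1)) * D1~(u) du: D1~ is constant on each (j-1)/m < u <= j/m.
t_d1 = sum(
    scale * d1_tilde(ranks, n_total, Fraction(j, m)) * Fraction(1, m)
    for j in range(1, m + 1)
)

# Stieltjes integral of (N/(N+1)) * t dD~(t): D~ jumps at each t = R_j/N.
t_d = sum(
    scale * Fraction(r, n_total)
    * (d_tilde(ranks, n_total, Fraction(r, n_total))
       - d_tilde(ranks, n_total, Fraction(2 * r - 1, 2 * n_total)))
    for r in ranks
)
```

All three quantities come out as the same fraction, which is the content of Theorem 1A for the score J(t) = t.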
2.1 Introduction

Part 1 has attempted to show that most conventional test statistics in
the "univariate: two sample problem" are functionals of a stochastic process
$\tilde D(u)$, $0\le u\le 1$, and proposes that the problem of statistical inference should
be posed as follows: what can we learn from a sample path of the stochastic
process $\tilde D(u)$, $0\le u\le 1$, assuming that it is the sum of a signal $D(u) = FH^{-1}(u)$
and a noise represented $C_D(u)/\sqrt{N}$:

$$\tilde D(u) = D(u) + \frac{1}{\sqrt{N}}\, C_D(u).$$

The covariance kernel of $C_D(u)$ in general is a function of the
following unknown functions (which it is our goal to estimate)

$D_F(u) = FH^{-1}(u)$, $D_G(u) = GH^{-1}(u)$,

$d_F(u) = D_F'(u)$, $d_G(u) = D_G'(u)$.

Note that $D_F(u) = D(u)$ and $\lambda D_F(u) + (1-\lambda) D_G(u) = u$.

Part 3 outlines a heuristic proof of the following conjecture.

Conjecture 2A. Covariance Kernel of $C_D(u)$. $E[C_D(u)\, C_D(v)]$ equals, for $u<v$,

$$(1-\lambda)^2 \left[ \lambda^{-1} d_G(u)\, d_G(v)\, D_F(u)(1-D_F(v)) + (1-\lambda)^{-1} d_F(u)\, d_F(v)\, D_G(u)(1-D_G(v)) \right].$$

Distribution of $C_D(u)$ under $H_0$ and local alternatives $H_1$. Under
$H_0: F = G$, $D_F(u) = D_G(u) = u$, $d_F(u) = d_G(u) = 1$, and $C_D(u)$ has covariance
kernel (for $u<v$)

$$(1-\lambda)^2 \left[ \lambda^{-1} u(1-v) + (1-\lambda)^{-1} u(1-v) \right] = \left(\frac{1-\lambda}{\lambda}\right) u(1-v),$$

which is the covariance kernel of $\left(\frac{1-\lambda}{\lambda}\right)^{1/2} B(u)$. When $H_0$ is true, or under
alternative hypotheses under which $H_0$ is only "gently" not true (as
opposed to "violently" not true), then [$\overset{d}{=}$ denotes equal in distribution]

$$C_D(u) \overset{d}{=} \left(\frac{1-\lambda}{\lambda}\right)^{1/2} B(u).$$

When the parameter in a statistical model is a function the statistical
inference techniques used are called functional statistical inference. By
introducing $D(u)$, $0\le u\le 1$, the two sample problem has been formulated as a
problem of functional statistical inference (abbreviated FUN.STAT) in
which the parameters to be estimated or tested are the function $D(u) = FH^{-1}(u)$,
its density $d(u) = D'(u)$, and its Fourier transform

$$\rho(v) = \int_0^1 e^{2\pi i u v} \, dD(u) = \int_0^1 e^{2\pi i u v}\, d(u) \, du, \qquad v = 0, \pm 1, \pm 2, \ldots$$

The hypothesis $H_0$ is equivalent to

$$H_0: D(u) = u; \quad d(u) = 1; \quad \rho(v) = 0 \ \text{for} \ v \ne 0.$$
To understand what we can learn from $d(u)$, let us relate it to the
underlying densities $f(x)$ and $g(x)$; the derivative of $D(u) = FH^{-1}(u)$ equals

$$d(u) = \frac{fH^{-1}(u)}{hH^{-1}(u)}.$$

Consequently, the reciprocal $d^{-1}(u)$ satisfies

$$d^{-1}(u) = \frac{hH^{-1}(u)}{fH^{-1}(u)} = \frac{\lambda fH^{-1}(u) + (1-\lambda)\, gH^{-1}(u)}{fH^{-1}(u)}.$$

Therefore: $d(u) \le \lambda^{-1}$; $d(u)$ tends to 0 if $f(x)$ tends to 0; $d(u)$ tends to $\lambda^{-1}$
if $g(x)$ tends to 0. By estimating $d^{-1}(u)$, one can estimate the likelihood
ratio $g(x)/f(x)$ without estimating $g(x)$ and $f(x)$ separately.

An estimator $\hat d(u)$ of $d(u)$ generates an estimator of $D(u)$ by

$$\hat D(u) = \int_0^u \hat d(t) \, dt.$$

To form an estimator $\hat d(u)$ from $\tilde D(u)$ it is often convenient to introduce
first a raw estimator of $\rho(v)$ denoted $\tilde\rho(v)$.

The sample pseudo-correlations are defined by, for $v = 0, \pm 1, \ldots$,

$$\tilde\rho(v) = \int_0^1 e^{2\pi i u v} \, d\tilde D(u) = \frac{1}{m} \sum_{j=1}^{m} \exp\{2\pi i v (R_j/N)\}.$$
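The sample pseudo-correlations are simple to compute from the ranks. A minimal sketch follows; the function name `pseudo_correlation` is hypothetical and the ranks are those of the small example of Part 1.

```python
import cmath

def pseudo_correlation(ranks, n_total, v):
    """Sample pseudo-correlation: (1/m) * sum_j exp(2*pi*i*v*R_j/N)."""
    m = len(ranks)
    return sum(cmath.exp(2j * cmath.pi * v * r / n_total) for r in ranks) / m

ranks, n_total = [2, 4], 6
rho_0 = pseudo_correlation(ranks, n_total, 0)
rho_1 = pseudo_correlation(ranks, n_total, 1)
rho_minus_1 = pseudo_correlation(ranks, n_total, -1)
```

With ranks 2 and 4 and N = 6, the two terms at v = 1 are the complex cube roots of unity other than 1, so their average is -1/2. The value at v = 0 is always 1, and the value at -v is the complex conjugate of the value at v.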
They obey the model (for alternative hypotheses close to $H_0$)

$$\tilde\rho(v) = \rho(v) + \left\{\frac{1-\lambda}{N\lambda}\right\}^{1/2} \eta(v), \qquad v = 0, \pm 1, \ldots,$$

defining

$$\eta(v) = \int_0^1 e^{2\pi i u v} \, dB(u), \qquad v = 0, \pm 1, \ldots;$$

one can show that $\eta(v)$ is a sequence of independent N(0,1) random variables.

A Brownian Bridge $B(u)$ can be represented [see Csorgo and Revesz (1981)]

$$B(u) = \sum_{v \ne 0} \eta(v) \int_0^u e^{-2\pi i v t} \, dt.$$

Under $H_0$, $|\tilde\rho(v)|^2$ is asymptotically distributed as a sequence of
independent random variables such that $\frac{2N\lambda}{1-\lambda} |\tilde\rho(v)|^2$ is chi-squared distributed
with 2 degrees of freedom. A 95% significance level for this statistic is 6.

To test $H_0$ one could examine whether any value of $|\tilde\rho(v)|^2$, $v=1,2,\ldots$,
exceeds a threshold such as $3(1-\lambda)/N\lambda$.

Natural "portmanteau" test statistics for $H_0$ are of the form
$\sum_{v\ge 1} |\tilde\rho(v)|^2 k_N(v)$ for a suitable weight function $k_N(v)$. The optimal choice of
weights $k_N(v)$ depends on the alternative hypothesis against which one is
testing $H_0$. If one makes an arbitrary choice of weights [such as
$k_N(v) = 1/v$ for $v\ge 1$], then there will always exist alternative hypotheses
against which the test statistic has efficiency close to 0. If one always
uses for a goodness of fit test the statistic

$$\sum_{v=1}^{20} |\tilde\rho(v)|^2$$

one will too often accept $H_0$ when it is false but only a few values of $\rho(v)$
are significantly non-zero; if one always uses the statistic

$$\sum_{v=1}^{4} |\tilde\rho(v)|^2$$

one will too often accept $H_0$ when it is false because $\rho(v) = 0$ for
$v=1,\ldots,4$ but is non-zero for $v>4$. To achieve an "optimal portmanteau"
test statistic, one might consider

$$\sum_{v=1}^{M} |\tilde\rho(v)|^2$$

where the order $M$ is determined by the data. Insight into how to choose $M$
is provided by density estimation.
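A rough sketch of these checks in Python is given below. It scales each $|\tilde\rho(v)|^2$ by $2N\lambda/(1-\lambda)$, so that under $H_0$ each term is approximately chi-squared with 2 degrees of freedom, and sums the first M scaled terms. The function names are hypothetical, the order M is simply supplied by the caller here rather than selected from the data, and comparing the sum with a chi-squared distribution on 2M degrees of freedom is an assumption that follows from the asymptotic independence stated above.

```python
import cmath

def pseudo_correlation(ranks, n_total, v):
    """Sample pseudo-correlation: (1/m) * sum_j exp(2*pi*i*v*R_j/N)."""
    m = len(ranks)
    return sum(cmath.exp(2j * cmath.pi * v * r / n_total) for r in ranks) / m

def scaled_squared_moduli(ranks, n_total, max_order):
    """Values (2*N*lam/(1-lam)) * |rho~(v)|^2 for v = 1..max_order."""
    lam = len(ranks) / n_total
    scale = 2 * n_total * lam / (1 - lam)
    return [
        scale * abs(pseudo_correlation(ranks, n_total, v)) ** 2
        for v in range(1, max_order + 1)
    ]

def portmanteau(ranks, n_total, order):
    """Sum of the first `order` scaled squared moduli."""
    return sum(scaled_squared_moduli(ranks, n_total, order))

ranks, n_total = [2, 4], 6
terms = scaled_squared_moduli(ranks, n_total, 2)
stat = portmanteau(ranks, n_total, 2)
```

Each entry of `terms` can be compared with the threshold 6 quoted in the text; with samples as small as this toy example the asymptotic approximation is of course not reliable.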
2.2 Density Estimation, Kernels, and Windows

Some insight into the problem of optimally choosing weights $k_N(v)$ can
be obtained by examining the density estimation problem in which one seeks to
optimally choose a test or estimator based on the data. Estimation of the
density $d(u)$, $0\le u\le 1$, can be based on its Fourier series representation:

$$d(u) = \sum_{v=-\infty}^{\infty} e^{-2\pi i u v} \rho(v).$$

A raw estimator $\tilde\rho(v)$ generates a symbolic raw estimator

$$\tilde d(u) = \sum_{v=-\infty}^{\infty} e^{-2\pi i u v} \tilde\rho(v).$$

The series defining $\tilde d(u)$ is symbolic because it does not converge. A natural
class of estimators $\hat d(u)$ of $d(u)$ are of the form, called kernel estimators,

$$\hat d(u) = \sum_{v=-\infty}^{\infty} k_N(v)\, e^{-2\pi i u v} \tilde\rho(v) = \int_0^1 K_N(u-t) \, d\tilde D(t),$$

defining

$$K_N(t) = \sum_{v=-\infty}^{\infty} e^{-2\pi i t v} k_N(v), \qquad k_N(v) = \int_{-0.5}^{0.5} e^{2\pi i t v} K_N(t) \, dt.$$

We call $k_N(v)$ a kernel, and $K_N(t)$ a window. The theoretical investigation
of these estimators in the context of the two sample problem is still very
open for research [see conjecture 2B below].

Example. The choice of weights
which we call a gap or leap estimator, or a numerical derivative.

Example. The density estimator $\hat d(u)$ can be motivated as a Bayes
estimator of $d(u)$ given the data $\tilde\rho(\cdot)$. Let

$$\hat d(u) = E[d(u) \mid \tilde\rho(\cdot)], \qquad \hat\rho(v) = E[\rho(v) \mid \tilde\rho(\cdot)].$$

Then

$$\hat d(u) = \sum_{v=-\infty}^{\infty} e^{-2\pi i u v} \hat\rho(v).$$

The prior distribution of $\rho(\cdot)$ is that $\rho(v)$, $v=0,\pm 1,\ldots$ is a sequence of
independent zero mean normal random variables with variance $E|\rho(v)|^2 = \theta\, C(v)$,
where $\theta$ is a scalar parameter and $C(v)$ is a known convergent sequence.
Under local alternatives $H_1$, and conditional on the value of $\rho(\cdot)$, we consider
$\tilde\rho(v)$, $v=1,2,\ldots$ to be independent with mean $\rho(v)$ and variance $C_N = (1-\lambda)/N\lambda$.
Then $\hat\rho(v) = k(v)\tilde\rho(v)$, where

$$k(v) = \frac{\mathrm{Var}[\rho(v)]}{\mathrm{Var}[\tilde\rho(v)]} = \frac{\theta\, C(v)}{C_N + \theta\, C(v)} = \left\{1 + \frac{C_N}{\theta\, C(v)}\right\}^{-1}.$$

An important family of weights of this form is

$$k(v) = \left\{1 + \left(\frac{v}{M}\right)^{2r}\right\}^{-1},$$

where one has to choose the exponent $r$ and the truncation (or half-power)
point $M$ adaptively from the data. This choice of weights can also be
motivated by formulating the density estimation problem as an optimization
problem: choose $d(u)$ to minimize

$$\int_0^1 |d(u) - \tilde d(u)|^2 \, du + p \int_0^1 |d^{(r)}(u)|^2 \, du,$$

where $p$ is a penalty parameter to be specified by the researcher.
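As a rough illustration of a kernel estimator with the weights $k(v) = \{1 + (v/M)^{2r}\}^{-1}$, the sketch below evaluates the weighted Fourier sum at a point u. Several things here are assumptions made only for the example: the function names, the truncation of the infinite sum at `vmax`, and the fixed values of M and r (which the text says should be chosen adaptively from the data).

```python
import cmath

def pseudo_correlation(ranks, n_total, v):
    """Sample pseudo-correlation: (1/m) * sum_j exp(2*pi*i*v*R_j/N)."""
    m = len(ranks)
    return sum(cmath.exp(2j * cmath.pi * v * r / n_total) for r in ranks) / m

def weight(v, half_power, r):
    """k(v) = 1 / (1 + (|v|/M)^(2r)); equals 1/2 at |v| = M."""
    return 1.0 / (1.0 + (abs(v) / half_power) ** (2 * r))

def d_hat(u, ranks, n_total, half_power, r, vmax=50):
    """Truncated kernel estimate: sum over |v| <= vmax of
    k(v) * exp(-2*pi*i*u*v) * rho~(v)."""
    total = 0j
    for v in range(-vmax, vmax + 1):
        total += (
            weight(v, half_power, r)
            * cmath.exp(-2j * cmath.pi * u * v)
            * pseudo_correlation(ranks, n_total, v)
        )
    # The terms for v and -v are complex conjugates, so the sum is real.
    return total.real

ranks, n_total = [2, 4], 6
grid = [i / 128 for i in range(128)]
values = [d_hat(u, ranks, n_total, half_power=2, r=2) for u in grid]
mean_value = sum(values) / len(values)
```

Because k(0) = 1 and the pseudo-correlation at v = 0 is 1, the estimate integrates to one over the unit interval; on an equally spaced grid finer than the truncation point the grid average reproduces this. The estimate is not guaranteed to be non-negative.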

In  the  general  context  of  functional  statistical  inference,  when  a 
kernel  estimator  is  motivated  by  an  optimization  problem  of  the  foregoing 
kind,  we  call  it  a  non-parametric  penalty  estimator.  A  density  estimator  is 
called  parametric  select  when  it  is  a  function  of  a  finite  number  of 
parameters  and  the  number  is  chosen  by  the  sample.  Autoregressive  density 
estimators  [described  in  section  2.4]  are  parametric  select  estimators. 

Conjecture 2B. The asymptotic distribution of kernel density
estimators can be developed from the theory [outlined in Part 3] of
functionals $\int_0^1 J(u) \, d\tilde D(u)$ and the representation $\hat d(u) = \int_0^1 K_N(u-t) \, d\tilde D(t)$.
Let $k(x)$, $-\infty<x<\infty$, be a kernel generating function, and take $k_N(v)$ to be
of the form

$$k_N(v) = k\!\left(\frac{v}{M}\right).$$

We call $M$ a bandwidth lag or truncation point or half-power point [depending
on the standardization of $k(\cdot)$]; it is a function of $N$, and tends to $\infty$ as $N$
tends to $\infty$. Then the asymptotic variance of $\hat d(u)$ is conjectured to satisfy

$$\frac{N\lambda}{M(1-\lambda)} \, \mathrm{Var}[\hat d(u)] = \|K\|^2 \, d_F(u)\, d_G(u),$$

where

$$\|K\|^2 = \int_{-\infty}^{\infty} K^2(t) \, dt = \int_{-\infty}^{\infty} k^2(x) \, dx,$$

defining $K(t) = \int_{-\infty}^{\infty} e^{-2\pi i x t} k(x) \, dx$, $k(x) = \int_{-\infty}^{\infty} e^{2\pi i x t} K(t) \, dt$.

Example. The kernel generating function

$$k(x) = \frac{\sin \pi x}{\pi x}, \qquad x \ne 0,$$

corresponds  to  the  window  generating  function 


correspond  to  a  leap  estimator  with  M=l/h.  The  foregoing  conjecture  for  the 
variance  of  a  kernel  estimator  agrees  with  the  formula  in  Part  3  for  the 
differential  variance  of  D(u). 
2.3 Parametric "Non-parametric" Tests based on Location Scale Parameter Model

A fully non-parametric estimator of $D(u) = FH^{-1}(u)$ is provided by $\tilde D(u)$.
A functionally parametric estimator $\hat D(u)$ is provided by density estimators
$\hat d(u)$ based on the kernel method [section 2.2] or the autoregressive method
[section 2.4]. A fully parametric estimator of $D(u)$ is based on estimating
parameters in a finite parametric model.

A frequently adopted parametric model for the distribution functions
$F$ and $G$ is the location-scale parameter model

$$F(x) = F_0\!\left(\frac{x-\mu_1}{\sigma_1}\right), \qquad G(x) = F_0\!\left(\frac{x-\mu_2}{\sigma_2}\right),$$

where $F_0(x)$ is a specified distribution function, and $\mu_j$, $\sigma_j$ are
unknown parameters. Equivalent parametric models for the quantile functions are

$$F^{-1}(u) = \mu_1 + \sigma_1 Q_0(u), \qquad G^{-1}(u) = \mu_2 + \sigma_2 Q_0(u).$$

A model for $D_1(t) = HF^{-1}(t)$ is easily obtained:

$$D_1(t) = \lambda t + (1-\lambda)\, GF^{-1}(t)$$
$$= \lambda t + (1-\lambda)\, F_0\!\left(\frac{\mu_1 + \sigma_1 Q_0(t) - \mu_2}{\sigma_2}\right)$$
$$= \lambda t + (1-\lambda)\, F_0\big(Q_0(t) - \theta - \psi\, Q_0(t)\big),$$

defining parameters

$$-\theta = \frac{\mu_1 - \mu_2}{\sigma_2}, \qquad -\psi = \frac{\sigma_1}{\sigma_2} - 1.$$

Alternative hypotheses $H_1$ which are local to $H_0$ correspond to assuming $\theta$
and $\psi$ to be near zero; then one can employ a linear Taylor series expansion
of $F_0(x)$ about $x = Q_0(t)$ to obtain

$$D_1(t) \approx t - (1-\lambda)\, \{\theta\, f_0Q_0(t) + \psi\, Q_0(t)\, f_0Q_0(t)\}.$$

Our goal is an approximate parametric formula for $D(u)$.

Conjecture 2C. An approximation for $D(u)$, valid for $\theta$ and $\psi$ near 0, is

$$D(u) = u + (1-\lambda)\,\theta\, f_0Q_0(u) + (1-\lambda)\,\psi\, Q_0(u)\, f_0Q_0(u).$$

A careful derivation of this approximation, and its consequences, is given
by Prihoda (1981). By substituting a parametric formula for $D(u)$ in the
model for $\tilde D(u)$ under local alternatives to $H_0$,

$$\tilde D(u) = D(u) + \left(\frac{1-\lambda}{N\lambda}\right)^{1/2} B(u),$$

one can obtain estimators $\hat\theta$ and $\hat\psi$ which provide parametric estimators of $D(u)$.

The model for $\tilde D(u)$ can be stated as a regression model in $\theta$ and $\psi$:

$$\frac{1}{1-\lambda}\{\tilde D(u) - u\} = \theta\, f_0Q_0(u) + \psi\, Q_0(u)\, f_0Q_0(u) + \{N\lambda(1-\lambda)\}^{-1/2} B(u).$$

This representation is similar to a representation used by Parzen (1979) in
the univariate: one sample case to form estimators of location and scale
parameters in the model $F(x) = F_0\!\left(\frac{x-\mu}{\sigma}\right)$:

$$f_0Q_0(u)\, \tilde Q(u) = \mu\, f_0Q_0(u) + \sigma\, Q_0(u)\, f_0Q_0(u) + B(u).$$


Linear rank statistics $\int_0^1 J(u) \, d\tilde D(u)$ arise naturally in expressions
for estimators $\hat\theta$ and $\hat\psi$. The variance and covariance of optimal estimators
$\hat\theta$ and $\hat\psi$, in the regular case, are given by inner products of $f_0Q_0(u)$ and
$Q_0(u)\, f_0Q_0(u)$ in the Reproducing Kernel Hilbert Space (RKHS) corresponding to the
process $\{N\lambda(1-\lambda)\}^{-1/2} B(u)$. One can use the data $\tilde D(u)-u$ over the full
interval $0<u<1$, or on a subinterval $0<p\le u\le q<1$, or at a discrete grid of
values $u_1,\ldots,u_k$ in (0,1).

Asymptotically efficient estimators $\hat\theta$ and $\hat\psi$ which are linear functionals
in $\tilde D(u)$ are obtained by applying the theory of regression analysis of
continuous parameter time series developed by Parzen (1961). Introduce the
reproducing kernel Hilbert Space inner product between functions $f(t)$ and
$g(t)$ on a subinterval $p\le t\le q$ of the unit interval corresponding to the
covariance kernel $K(u_1,u_2) = \min(u_1,u_2) - u_1u_2$ of a Brownian Bridge process:

$$\langle f,g\rangle_{p,q} = \int_p^q f'(t)\, g'(t) \, dt + \frac{1}{p} f(p) g(p) + \frac{1}{1-q} f(q) g(q).$$

Digression. We find interesting an alternate expression:

$$\langle f,g\rangle_{p,q} = \int_0^1 f^{*\prime}(t)\, g^{*\prime}(t) \, dt,$$

where

$$f^{*\prime}(t) = f'(t), \qquad p<t<q;$$
$$f^{*\prime}(t) = \frac{1}{p} f(p), \qquad 0<t\le p;$$
$$f^{*\prime}(t) = -\frac{1}{1-q} f(q), \qquad q\le t<1.$$

To form the inner product of $f(t)$ and $g(t)$ over $0<t<1$ we require
$f(0) = f(1) = g(0) = g(1) = 0$; then

$$\langle f,g\rangle = \int_0^1 f'(t)\, g'(t) \, dt.$$

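Note that $(f_0Q_0)'(u) = -J_0(u)$, $\{Q_0(u)\, f_0Q_0(u)\}' = 1 - J_0(u)Q_0(u)$. To
form the estimators $\hat\theta$ and $\hat\psi$ we form: the information matrix

$$I = \begin{bmatrix} I_{11} & I_{12} \\ I_{12} & I_{22} \end{bmatrix}, \qquad I_{11} = \langle f_0Q_0, f_0Q_0\rangle, \quad I_{12} = \langle f_0Q_0, Q_0 f_0Q_0\rangle, \quad I_{22} = \langle Q_0 f_0Q_0, Q_0 f_0Q_0\rangle,$$

and the statistics

$$T_1 = \langle f_0Q_0, \{\tilde D(u)-u\}\rangle,$$
$$T_2 = \langle Q_0 f_0Q_0, \{\tilde D(u)-u\}\rangle.$$

Then

$$\begin{bmatrix} \hat\theta \\ \hat\psi \end{bmatrix} = \frac{1}{1-\lambda}\, I^{-1} \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}.$$

In the symmetric case $f_0Q_0(u) = f_0Q_0(1-u)$, and $Q_0(u) = -Q_0(1-u)$. Then
$I_{12} = 0$ when $q = 1-p$.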

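Let us explicitly evaluate the inner products when we use the interval
$0<u<1$ and all the data $\tilde D(u)$, $0\le u\le 1$. We must assume that
$\int_0^1 J_0(u) \, du = \int_0^1 \{1 - J_0(u)Q_0(u)\} \, du = 0$. Then

$$I_{11} = \int_0^1 |J_0(u)|^2 \, du, \qquad I_{22} = \int_0^1 |1 - J_0(u)Q_0(u)|^2 \, du,$$

$$T_1 = -\int_0^1 J_0(u) \, d\tilde D(u),$$

$$T_2 = \int_0^1 \{1 - J_0(u)\, Q_0(u)\} \, d\tilde D(u).$$

The covariance matrix of $\hat\theta$ and $\hat\psi$ is given in general by

$$\begin{bmatrix} \mathrm{Var}(\hat\theta) & \mathrm{Cov}(\hat\theta,\hat\psi) \\ \mathrm{Cov}(\hat\theta,\hat\psi) & \mathrm{Var}(\hat\psi) \end{bmatrix} = \frac{1}{N\lambda(1-\lambda)}\, I^{-1}.$$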


To illustrate the meaning of the foregoing formula for variance,
consider $\hat\theta$ in the normal case. Assume that $\sigma_1 = \sigma_2 = \sigma$.
Let $\hat\mu_j$ be the sample means and $\hat\sigma^2$ the variance of the pooled sample.
Then $\hat\theta = (\hat\mu_2 - \hat\mu_1)/\hat\sigma$ has asymptotic variance equal to $\frac{1}{m} + \frac{1}{n} = N/mn =
\{N\lambda(1-\lambda)\}^{-1} I_{11}^{-1}$, since $J_0(u) = \Phi^{-1}(u)$ in the normal case.

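To test $H_0: F=G$ the analogue of conventional "non-parametric" test
statistics (which could be called a parametric "non-parametric" test
statistic for $H_0$) is the quadratic form [where * denotes transpose]

$$L = \begin{bmatrix} \hat\theta \\ \hat\psi \end{bmatrix}^{*} \begin{bmatrix} \mathrm{Var}[\hat\theta] & \mathrm{Cov}[\hat\theta,\hat\psi] \\ \mathrm{Cov}[\hat\theta,\hat\psi] & \mathrm{Var}[\hat\psi] \end{bmatrix}^{-1} \begin{bmatrix} \hat\theta \\ \hat\psi \end{bmatrix}$$

which, under the null hypothesis $H_0$, has a Chi-squared distribution with
two degrees of freedom.

In terms of the test statistics $T_1$ and $T_2$ one can write

$$L = \frac{N\lambda}{1-\lambda} \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}^{*} I^{-1} \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}.$$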


Example: The logistic distribution has standard quantile function
and score function

$$Q_0(u) = \log \frac{u}{1-u}, \qquad J_0(u) = 2u-1.$$

Therefore [see Eubank (1979)]

$$I_{11} = \int_0^1 (2u-1)^2 \, du = \frac{1}{3}, \qquad I_{22} = \frac{3 + \pi^2}{9}.$$
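The two logistic information values can be checked numerically from the full-interval formulas $I_{11} = \int_0^1 |J_0(u)|^2 du$ and $I_{22} = \int_0^1 |1 - J_0(u)Q_0(u)|^2 du$. The sketch below uses a plain midpoint rule; the function name and the number of grid cells are arbitrary choices for the illustration, and the second integrand has an integrable logarithmic singularity at both endpoints, so only a few digits of accuracy should be expected.

```python
import math

def logistic_information(cells=200_000):
    """Midpoint-rule approximations of I_11 and I_22 for the logistic model."""
    h = 1.0 / cells
    i11 = 0.0
    i22 = 0.0
    for k in range(cells):
        u = (k + 0.5) * h
        j0 = 2.0 * u - 1.0              # score function J_0(u)
        q0 = math.log(u / (1.0 - u))    # quantile function Q_0(u)
        i11 += j0 * j0 * h
        i22 += (1.0 - j0 * q0) ** 2 * h
    return i11, i22

i11, i22 = logistic_information()
```

The first value agrees with 1/3 essentially to rounding error, and the second is close to $(3+\pi^2)/9 \approx 1.43$.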


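A non-parametric test statistic for location [which is optimum for the
logistic distribution] is

$$L_1 = \frac{N\lambda}{1-\lambda} \frac{|T_1|^2}{I_{11}} = \frac{12N\lambda}{1-\lambda} \left| \int_0^1 \left(u - \tfrac{1}{2}\right) d\tilde D(u) \right|^2,$$

which is asymptotically equivalent to the Wilcoxon statistic $\sum_j R_j$. It is
equivalent in a finite sample if we define $\tilde D(u)$ to be piecewise constant
and equal to $j/m$ at $u = R_j/(N+1)$; then

$$\int_0^1 \left(u - \tfrac{1}{2}\right) d\tilde D(u) = \frac{1}{m} \sum_{j=1}^{m} \left( \frac{R_j}{N+1} - \frac{1}{2} \right).$$

A non-parametric test statistic for scale [which is optimum for the
logistic distribution] is

$$L_2 = \frac{N\lambda}{1-\lambda} \frac{|T_2|^2}{I_{22}} = \frac{36N\lambda}{(1-\lambda)(3+\pi^2)} \left| \frac{1}{m} \sum_{j=1}^{m} \left\{ \left( \frac{R_j}{N+1} - \frac{1}{2} \right) \log \frac{R_j}{N+1-R_j} - \frac{1}{2} \right\} \right|^2.$$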


This  test  may  have  been  given  first  by  Prihoda  (1981). 

Motivation of entropy or information measures as "portmanteau" test statistics.

Parametric "non-parametric" tests of $H_0$ given by L may be most powerful when one is testing alternative hypotheses which correspond to shifts in location and scale parameters. To obtain general "portmanteau" procedures, which do not require close specification of the alternative hypothesis, let us re-express the statistic L in terms of the estimated density

$$\hat d(u) = 1 + (1-\lambda)\left[\hat\theta\,(f_0Q_0(u))' + \hat\phi\,(Q_0(u)\,f_0Q_0(u))'\right]$$


One may verify that

$$\int_0^1 |\hat d(u) - 1|^2\,du = \begin{bmatrix} T_1 \\ T_2 \end{bmatrix}^{*} \begin{bmatrix} I_{11} & I_{12} \\ I_{21} & I_{22} \end{bmatrix}^{-1} \begin{bmatrix} T_1 \\ T_2 \end{bmatrix} = L\,\{(1-\lambda)/N\lambda\}$$

One is led to conjecture that

$$\frac{N\lambda}{1-\lambda}\int_0^1 [\hat d(u) - 1]^2\,du$$

is a test statistic for $H_0$ whose null distribution is chi-squared with degrees of freedom equal to the number of parameters in $\hat d(u)$. Next one is led to conjecture that the entropy or information measure

$$\frac{2N\lambda}{1-\lambda}\int_0^1 -\log \hat d(u)\,du$$

is a test statistic for $H_0$ whose null distribution is chi-squared with degrees of freedom equal to the number of parameters in $\hat d(u)$. We next introduce autoregressive estimators $\hat d_m(u)$ for which $\int_0^1 \log \hat d_m(u)\,du$ is evaluated as a parameter without integrating an estimated density.


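$$\hat d_m(u) = \hat K_m \left| 1 + \alpha_m(1)\,e^{2\pi i u} + \cdots + \alpha_m(m)\,e^{2\pi i u m} \right|^{-2}$$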


where the complex-valued autoregressive coefficients $\alpha_m(j)$ are computed by the Yule-Walker equations described below. Further

$$H(\hat d_m) = \int_0^1 -\log \hat d_m(u)\,du = -\log \hat K_m$$

is directly computed in terms of the parameter $\hat K_m$.


The Yule-Walker equations [which are solved to obtain $\alpha_m(j)$, j = 1, ..., m, and $\hat K_m$ from $\rho(v)$, v = 0, ±1, ..., ±m] are [see Parzen (1979)]

$$\sum_{j=0}^{m} \alpha_m(j)\,\rho(j-k) = 0, \qquad k = 1, \ldots, m,$$

$$\sum_{j=0}^{m} \alpha_m(j)\,\rho(j) = \hat K_m$$


where $\alpha_m(0) = 1$. They are solved using the recursive algorithm

$$\alpha_m(m) = -\frac{1}{\hat K_{m-1}} \sum_{j=0}^{m-1} \alpha_{m-1}(j)\,\rho(j-m)$$

and for j = 1, ..., m-1

$$\alpha_m(j) = \alpha_{m-1}(j) + \alpha_m(m)\,\alpha^{*}_{m-1}(m-j)$$

where $\alpha^{*}$ is the complex conjugate of $\alpha$. The autoregressive method of estimating densities was first implemented in a computer program, and its theory investigated, by Carmichael (1976, 1978).
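The recursion above can be sketched in a few lines. This is a minimal illustration, not the TWOSAM implementation: the function names are hypothetical, $\rho(-v)$ is assumed to be the complex conjugate of $\rho(v)$, and the residual-variance update $K_m = K_{m-1}(1 - |\alpha_m(m)|^2)$ is the standard companion step of the recursion, which is not printed in the text. The input uses the ranks of the Mark Twain example of Part 4, with jumps placed at $R_j/(N+1)$.

```python
import numpy as np

def lagged(rho, v):
    """rho(v) for any integer v, using rho(-v) = conj(rho(v))."""
    return rho[v] if v >= 0 else np.conj(rho[-v])

def levinson(rho, m):
    """Solve the order-m Yule-Walker equations by the recursion in the text.

    rho[v] holds rho(v) for v = 0..m. Returns (alpha, K) with
    alpha[j] = alpha_m(j), alpha[0] = 1, and K = K_m.
    """
    alpha = np.array([1.0 + 0.0j])
    K = lagged(rho, 0).real
    for order in range(1, m + 1):
        prev = alpha
        # alpha_m(m) = -(1/K_{m-1}) * sum_j alpha_{m-1}(j) rho(j - m)
        last = -sum(prev[j] * lagged(rho, j - order) for j in range(order)) / K
        alpha = np.zeros(order + 1, dtype=complex)
        alpha[0] = 1.0
        for j in range(1, order):
            alpha[j] = prev[j] + last * np.conj(prev[order - j])
        alpha[order] = last
        # Assumed companion update for the residual variance.
        K = K * (1.0 - abs(last) ** 2)
    return alpha, K

def yule_walker_residuals(rho, alpha, K):
    """Left-hand sides of the Yule-Walker equations minus their right-hand sides."""
    m = len(alpha) - 1
    eqs = [sum(alpha[j] * lagged(rho, j - k) for j in range(m + 1))
           for k in range(1, m + 1)]
    eqs.append(sum(alpha[j] * lagged(rho, j) for j in range(m + 1)) - K)
    return np.array(eqs)

# Illustrative input: pseudo-correlations of a step function with jumps of
# equal size at u_j = R_j / (N + 1), N = 18.
u = np.array([8, 9, 13, 14, 15, 16, 17, 18]) / 19
rho = np.array([np.mean(np.exp(2j * np.pi * v * u)) for v in range(4)])
alpha, K = levinson(rho, 3)
residuals = yule_walker_residuals(rho, alpha, K)
print(np.max(np.abs(residuals)))
```

Substituting the result back into the Yule-Walker equations, as `yule_walker_residuals` does, is a cheap way to confirm that a particular sign and conjugation convention has been implemented consistently.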

A proposed diagnostic for determining the order m of an autoregressive estimator $\hat d_m(u)$ is the plot of


D*(u)  =  ^  d5(u) 


An intuitive criterion for choosing an optimal order m is the smallest value of m for which $\hat D_m(u)$, $0 \le u \le 1$, is not significantly different from $D_0(u) = u$, $0 \le u \le 1$.

This paper aims to raise the consciousness of statisticians about the FUN.STAT (functional statistical inference) approach to the two sample problem. At this time we can only state conjectures about the theorems that need to be proved [theoretically and/or by Monte Carlo calculations].

One theorem is about the large-sample properties of autoregressive density estimators; another theorem is about the use of estimates $-\log \hat K_m$ [of entropy or information measures] to test $H_0$ and to form order determining criteria for optimal autoregressive orders m. A noteworthy irony is that the orders m chosen in practice are small, and one might wonder about the relevance of a large sample consistency theorem.

Conjecture 2D. Formula for the asymptotic variance of $\hat d_m(u)$ as an estimator of d(u): As m tends to $\infty$ at a suitable rate [such as $m^3/N \to 0$ as $N \to \infty$] $\hat d_m(u)$ tends in probability to d(u) and its asymptotic variance satisfies

$$\frac{N}{m}\,\mathrm{Var}[\hat d_m(u)] = 2\,\frac{1-\lambda}{\lambda}\,d_F(u)\,d_G(u) .$$

(Note  that  m  denotes  the  autoregressive  order  and  not  the  size  of  the  X 
sample.)  This  conjecture  is  based  on  the  formulas  for  the  variance  of  the 
kernel  estimators  conjectured  in  section  2.2,  and  the  relations  between  the 
distributions  of  kernel  and  autoregressive  estimators  of  the  spectral 
density  function  of  a  stationary  time  series  [conjectured  by  Parzen  (1969), 
and  confirmed  by  Berk  (1974)]. 

Conjecture 2E. A "portmanteau" (alternative hypotheses unspecified) procedure for testing $H_0$ [which may have optimal properties] is of the form: accept $H_0$ if


It  should  be  emphasized  that  further  theoretical  and  Monte  Carlo 
investigation  is  required  to  find  the  best  multiple  of  the  right  hand 
side  to  use  in  practice  [perhaps  including  a  factor  of  log  log  m]. 

Conjecture 2F. A procedure for autoregressive density estimator order determination. If one rejects $H_0$ because one of the inequalities in Conjecture 2E is violated, let $\hat m$ be the value of m minimizing a criterion of the form

$$\mathrm{AIC}(m) = \log \hat K_m + \frac{1-\lambda}{N\lambda}\,2m$$

An estimator of d(u) is taken to be $\hat d_{\hat m}(u)$. Note that $\mathrm{AIC}(\hat m) < 0$.

Conjecture 2G. Can one develop criteria for accepting $H_0$ based on the values of $\frac{2N\lambda}{1-\lambda}\,|\rho(v)|^2$, v = 1, 2, ...? Under $H_0$ these statistics are asymptotically independent chi-squared distributed with 2 degrees of freedom.


Part 3. Asymptotic Distributions of Stochastic Processes Arising in Two Sample Quantile Data Analysis

3.1 Introduction

The FUN.STAT approach to testing equality of two independent populations with bi-continuous distributions F and G respectively is based on:

(1) estimating parameters which are functions such as $D_1(u) = HF^{-1}(u)$ and $D(u) = FH^{-1}(u)$, and

(2) exploratory data analysis of the fully non-parametric estimators $\tilde D_1(u) = \tilde H\tilde F^{-1}(u)$ and $\tilde D(u) = \tilde D_1^{-1}(u)$.

This part discusses how to derive the asymptotic distribution of the stochastic processes $\tilde D_1(u)$, $0 \le u \le 1$, and $\tilde D(u)$, $0 \le u \le 1$. Our aim is to outline an operational calculus for intuitively deriving results concerning the distribution of empirical processes [such as $\tilde D(u)$], and for identifying stochastic processes $C_D(u)$ and $C_{D_1}(u)$ such that

$$\sqrt{N}\,\{\tilde D(u) - D(u)\} \xrightarrow{D} C_D(u), \qquad \sqrt{N}\,\{\tilde D_1(u) - D_1(u)\} \xrightarrow{D} C_{D_1}(u),$$

where $\xrightarrow{D}$ connotes convergence in distribution of stochastic processes.

Our  results  are  heuristic  theorems,  rather  than  rigorously  proved  theorems 
with  carefully  stated  regularity  assumptions. 

Theorem 3A. Asymptotic distribution of a sample distribution function $\tilde F(x)$. $\tilde F(x)$ of a random sample $X_1, \ldots, X_m$ can be expressed in terms of $\tilde F_U(u)$ of $U_1 = F(X_1), \ldots, U_m = F(X_m)$, which are uniform on [0,1]. Note that $\tilde F F^{-1}(u) = \tilde F_U(u)$. One can show that there is a Brownian Bridge process B(u), $0 \le u \le 1$, such that

$$m^{1/2}\{\tilde F_U(u) - u\} \xrightarrow{D} B(u)$$

We denote by $B_F(u)$ a Brownian Bridge process such that

$$\tilde B_F(u) = m^{1/2}\{\tilde F F^{-1}(u) - u\} \xrightarrow{D} B_F(u) .$$

Next define

$$\tilde C_F(x) = m^{1/2}\{\tilde F(x) - F(x)\} \xrightarrow{D} B_F(F(x))$$

The limiting process of $\tilde F(x)$ is denoted $C_F(x)$, where $C_F(x) = B_F(F(x))$.

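Theorem 3B. Asymptotic distribution of sample quantile function $\tilde F^{-1}(u)$. $\tilde F_U^{-1}(u) = F\tilde F^{-1}(u)$ satisfies

$$\tilde B_{F^{-1}}(u) = m^{1/2}\{F\tilde F^{-1}(u) - u\} \xrightarrow{D} -C_F(F^{-1}(u)) = -B_F(u)$$

$\tilde F^{-1}(u)$, under suitable conditions on $fF^{-1}(u)$ [see Csörgő and Révész (1981)], satisfies

$$m^{1/2}\{\tilde F^{-1}(u) - F^{-1}(u)\} \xrightarrow{D} -\frac{1}{fF^{-1}(u)}\,B_F(u)$$

The limiting process of $\tilde F^{-1}(u)$ is denoted $C_{F^{-1}}(u)$. We note the basic relation

$$C_{F^{-1}}(u) = (-1)\left\{\frac{d}{du}F^{-1}(u)\right\} C_F(F^{-1}(u))$$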


Proof: Write $F\tilde F^{-1}(u) - u = F\tilde F^{-1}(u) - \tilde F\tilde F^{-1}(u) + \tilde F\tilde F^{-1}(u) - u$. One may verify that

$$\tilde C_F(\tilde F^{-1}(u)) \approx \tilde C_F(F^{-1}(u)) = \tilde B_F(u) .$$

The first conclusion may now be inferred. Next $m^{1/2}\{\tilde F^{-1}(u) - F^{-1}(u)\}$ equals

$$\left\{\frac{\tilde F^{-1}(u) - F^{-1}(u)}{F\tilde F^{-1}(u) - FF^{-1}(u)}\right\}\left\{m^{1/2}\,(F\tilde F^{-1}(u) - u)\right\}$$

The left bracket contains the reciprocal of a difference quotient which tends to f(x) evaluated at $x = F^{-1}(u)$. The right bracket converges to $-B_F(u)$.

3.2 Conjectures in Distribution of D(u)

To apply Theorems 3A and 3B to the two-sample problem we first derive heuristically the asymptotic distribution of $\tilde G\tilde F^{-1}(u)$ as an estimator of $GF^{-1}(u)$:


SX<1  i^‘1(u)  - u'  4 


M  {FF_1(u)  -  FF"1 (u) } 


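$$\begin{aligned}
\sqrt{N}\,\{\tilde G\tilde F^{-1}(u) - GF^{-1}(u)\}
&= \sqrt{N}\,\{\tilde G G^{-1}(G\tilde F^{-1}(u)) - G\tilde F^{-1}(u) + G\tilde F^{-1}(u) - GF^{-1}(u)\} \\
&= (1-\lambda)^{-1/2}\,\tilde B_G(G\tilde F^{-1}(u)) + \lambda^{-1/2}\,\frac{G\tilde F^{-1}(u) - GF^{-1}(u)}{F\tilde F^{-1}(u) - u}\,\tilde B_{F^{-1}}(u) \\
&\xrightarrow{D} (1-\lambda)^{-1/2}\,B_G(GF^{-1}(u)) - \lambda^{-1/2}\,\{GF^{-1}(u)\}'\,B_F(u) .
\end{aligned}$$

The asymptotic distribution of

$$\tilde D_1(u) = \tilde H\tilde F^{-1}(u) = \lambda\,\tilde F\tilde F^{-1}(u) + (1-\lambda)\,\tilde G\tilde F^{-1}(u)$$

as an estimator of $D_1(u) = HF^{-1}(u) = \lambda u + (1-\lambda)\,GF^{-1}(u)$ is described by the asymptotic distribution of $\tilde H\tilde F^{-1}(u) - HF^{-1}(u)$.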


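Conjecture 3C. $\sqrt{N}\,\{\tilde H\tilde F^{-1}(u) - HF^{-1}(u)\}$ and $\sqrt{N}\,(1-\lambda)\,\{\tilde G\tilde F^{-1}(u) - GF^{-1}(u)\}$ converge in distribution to

$$C_{D_1}(u) = (1-\lambda)\left[(1-\lambda)^{-1/2}\,B_G(GF^{-1}(u)) - \lambda^{-1/2}\,\frac{gF^{-1}(u)}{fF^{-1}(u)}\,B_F(u)\right]$$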


Let $d_1(u) = D_1'(u) = hF^{-1}(u)/fF^{-1}(u)$. Then $d_1(D_1^{-1}(u)) = hH^{-1}(u)/fH^{-1}(u)$. The asymptotic distribution of $\tilde D(u) = (\tilde H\tilde F^{-1})^{-1}(u)$ as an estimator of $D(u) = (HF^{-1})^{-1}(u) = FH^{-1}(u)$ is conjectured (using proofs similar to those used for sample quantile functions) to satisfy the following theorem.


Conjecture 3D. Asymptotic distribution of $\tilde D(u)$.

$$\sqrt{N}\,\{\tilde D(u) - D(u)\} \xrightarrow{D} C_D(u)$$

where $C_D(u) = -\frac{1}{d_1(D_1^{-1}(u))}\,C_{D_1}(D_1^{-1}(u))$ is explicitly given by

$$C_D(u) = -(1-\lambda)\left[(1-\lambda)^{-1/2}\,d_F(u)\,B_G(GH^{-1}(u)) - \lambda^{-1/2}\,d_G(u)\,B_F(FH^{-1}(u))\right]$$

The covariance kernel of $C_D(u)$, given in Conjecture 2A, shows that the complex-looking process $C_D(u)$ actually has much simplifying symmetry. It has been previously derived in Pyke-Shorack (1968) who show

$$\sqrt{N}\,\{\tilde F\tilde H^{-1}(u) - FH^{-1}(u)\} \xrightarrow{D} C_D(u)$$

An interesting question is whether the asymptotic distribution of $\tilde D(u)$ can be deduced from the Pyke-Shorack results using the fact [Theorem 1B] that $\tilde D(u) - \tilde F\tilde H^{-1}(u)$ equals 0 except for about m sub-intervals of length 1/N in which it equals 1/m.

Distribution of stochastic Stieltjes integrals and linear rank statistics. The process $\tilde D(u)$ has the important property that a linear rank statistic can be asymptotically represented as a stochastic Stieltjes integral

$$\int_0^1 J(u)\,d\tilde D(u)$$

for a suitable continuous score function J(u). Its limiting distribution can be described as follows:

$$\tilde A(J) = \sqrt{N}\left\{\int_0^1 J(u)\,d\tilde D(u) - \int_0^1 J(u)\,dD(u)\right\}$$

is asymptotically normal with zero mean and covariance kernel $K_A(J_1,J_2)$, heuristically representing the covariance of $\int_0^1 J_1(u)\,dC_D(u)$ and $\int_0^1 J_2(u)\,dC_D(u)$, and given by

$$K_A(J_1,J_2) = \int_0^1\!\!\int_0^1 J_1(u_1)\,J_2(u_2)\,dE[C_D(u_1)\,C_D(u_2)]$$

Expl icitly 

J i  * ^ 2 ^  =  ^■|(d']»d2^  +  ^3^1  ,l^2^  ”  ^2^1  '^2* 

where 

Kjt^.Jg)  =  ^  ll  Ji(u)  J2^u^  dG^u^  dp(u)  du  ; 

2 

K2(Jn , J2)  =  -  ~-X-)-  /q  J]  (u)  {dg(u)Dp(u) } '  du  /g  J2(u)  {dG(u)Dp( u) } '  du 
+  0-a)/J  0] ( u)  {dp( u)Dq( u) } 1  du  /J  J2(u){dp(u)DG(u)}'  du  ; 

,d2^  =  Iq  ^0  dul  ^u2  di  ( u-|  )d2^u2 ^ 

[{X-1  dg  (u^dg  (u2)Dfr(min(u1  ,u2))  +  (l-x)_1dp  (u1  )dp- (u2)Dg(min(u1  ,u2 

+  eC^-Ug)  {A_1dg  (u1  )dg(u2)dp(u2)  +  (l-x)_1dp  (u1  )dp(u2)dg(u2)} 

+  e(u2-u1){x'1dg  (u2)dG(u1)dF(u1)  +  (l-x)-1dp  (u2)dp(u1 )dG(u^ )}] 

where e(t) = 1 or 0 as t > 0 or t < 0. Under the null hypothesis $H_0$

$$K_A(J_1,J_2) = \frac{1-\lambda}{\lambda}\left\{\int_0^1 J_1(u)\,J_2(u)\,du - \int_0^1 J_1(u)\,du \int_0^1 J_2(u)\,du\right\} .$$

One can obtain the asymptotic covariance of $\rho(v_1)$ and $\rho(v_2)$ by choosing

$$J_1(u) = e^{2\pi i u v_1}, \qquad J_2(u) = e^{-2\pi i u v_2} .$$
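As a worked instance of the null-hypothesis formula for $K_A$, take the exponential score functions just chosen with integer lags $v_1, v_2 \ge 1$. The short calculation below is a sketch under that assumption; $\delta_{v_1 v_2}$ denotes the Kronecker delta, a symbol introduced here for convenience.

```latex
\begin{align*}
\int_0^1 J_1(u)\,J_2(u)\,du &= \int_0^1 e^{2\pi i u (v_1 - v_2)}\,du = \delta_{v_1 v_2}, \\
\int_0^1 J_1(u)\,du &= \int_0^1 J_2(u)\,du = 0, \\
K_A(J_1, J_2) &= \frac{1-\lambda}{\lambda}\,\delta_{v_1 v_2}.
\end{align*}
```

So under $H_0$ the pseudo-correlations at different lags are asymptotically uncorrelated and $N\,E|\rho(v)|^2$ is approximately $(1-\lambda)/\lambda$, which is consistent with treating $\{2N\lambda/(1-\lambda)\}\,|\rho(v)|^2$ as chi-squared with 2 degrees of freedom.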

3.3  Density  Estimation  and  Differential  Variance 

Insight into the asymptotic variances of density estimators is provided by a formula for the variance of the fully non-parametric estimator of d(u) = D'(u) given by the numerical derivative

$$\tilde d(u) = \frac{\tilde D(u+h) - \tilde D(u-h)}{2h}$$


Conjecture 3E. A formula for the asymptotic variance of the numerical derivative $\tilde d(u)$ is

$$2hN\,\mathrm{Var}[\tilde d(u)] = \frac{1-\lambda}{\lambda}\,d_F(u)\,d_G(u)$$

The expression on the right hand side is called the differential variance of $\tilde D(u)$; it can be used to suggest conjectures concerning the asymptotic distributions of kernel and autoregressive density estimators [Conjectures 2B and 2D]. The form of the differential variance suggests that $\tilde d(u)$ has the distributional properties of a density-quantile estimator, since an estimator of a probability density d(u) has variance proportional to d(u), while an estimator of a density-quantile d(u) has variance proportional to $d^2(u)$.


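Conjecture 3F. A fully non-parametric estimator of $d_1(u) = \lambda + (1-\lambda)\,\{gF^{-1}(u)/fF^{-1}(u)\}$ given by

$$\tilde d_1(u) = \frac{\tilde D_1(u+h) - \tilde D_1(u-h)}{2h}$$

has asymptotic variance satisfying

$$2hN\,\mathrm{Var}[\tilde d_1(u)] = \frac{1-\lambda}{\lambda}\,\{GF^{-1}(u)\}'\,\{HF^{-1}(u)\}' = \frac{1-\lambda}{\lambda}\,\frac{gF^{-1}(u)}{fF^{-1}(u)}\,d_1(u)$$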


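Outline of a heuristic proof of Conjecture 3E. $2hN\,\mathrm{Var}[\tilde d(u)]$ approximately equals

$$\begin{aligned}
\frac{1}{2h}\,E|C_D(u+h) - C_D(u-h)|^2
&= \frac{(1-\lambda)^2}{2h}\Big[(1-\lambda)^{-1}\,d_F^2(u)\,E|B_G(GH^{-1}(u+h)) - B_G(GH^{-1}(u-h))|^2 \\
&\qquad\qquad + \lambda^{-1}\,d_G^2(u)\,E|B_F(FH^{-1}(u+h)) - B_F(FH^{-1}(u-h))|^2\Big] \\
&= \frac{1-\lambda}{\lambda}\left[\lambda\,d_F^2(u)\,d_G(u) + (1-\lambda)\,d_G^2(u)\,d_F(u)\right] \\
&= \frac{1-\lambda}{\lambda}\,d_F(u)\,d_G(u)
\end{aligned}$$

since $\lambda\,d_F(u) + (1-\lambda)\,d_G(u) = 1$.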


of density estimators. A general theory of asymptotic distribution of density estimators can be developed by assuming that $K_A(J_1,J_2)$ can be represented

$$K_A(J_1,J_2) = \int_0^1 J_1(u)\,J_2(u)\,V_1(u)\,du + \int_0^1\!\!\int_0^1 J_1(u_1)\,J_2(u_2)\,V_2(u_1,u_2)\,du_1\,du_2$$

where $V_1(u)$ and $V_2(u_1,u_2)$ are integrable functions. We call $V_1(u)$ the differential variance; $V_2(u_1,u_2)$ vanishes in formulas for the asymptotic variance of kernel and autoregressive density estimators. Spectral averages of the spectral density of a stationary time series which is a linear process have the foregoing structure [see Parzen (1961), p. 982].


Part  4.  Summary  of  Two  Sample  Quantile  Data  Analysis  Using  TWOSAM 

To test the null hypothesis $H_0$: F(x) = G(x) of equality of two populations, statisticians usually choose a test statistic (TN, Dmn, W*, etc.), compute its value from the data, and test the significance of the computed value of the test statistic chosen. This paper shows that conventional test statistics can be represented as functionals of the process $\tilde D(u)$, $0 \le u \le 1$, and proposes an autoregressive density estimation approach to the data analysis of $\tilde D(u)$.

In  addition  to  providing  the  applied  statistician  with  the  ability 
to  analyze  sample  paths  of  continuous  parameter  stochastic  processes 
[such  as  D(u),  0<u<l],  this  paper  aims  to  stimulate  the  applied  statistician 
to  appreciate  the  basic  probability  theory  of  these  stochastic  processes. 

A graphical (rather than an arithmetical) way to test $H_0$ is to plot $\tilde D(u)$, $0 \le u \le 1$, and examine whether its deviation from the uniform distribution $D_0(u) = u$ appears to be significantly different from the sample path of a Brownian Bridge with variance $(1-\lambda)/\lambda N$.

The  proposed  quantile  data  analysis  approach  to  the  univariate:  two 
sample  problem  involves  several  stages. 

Stage  1 .  Fully  non-parametric  analysis.  Obtain  for  each  of  the  two  samples, 
and  for  the  pooled  sample,  descriptive  statistics  and  plots  of  the 
informative  quantile  function.  Plot  on  one  graph  the  quantile  functions  of 
the  two  samples.  Plot  D(u). 

Stage 2. Autoregressive analysis. Obtain: $|\rho(v)|^2$, square modulus of sample pseudo-correlations, for v = 1, ..., M where M is a specified maximum order. Plot $\hat d_m(u)$ and $\hat D_m(u)$ for m = 1, ..., M. List values of $\hat K_m$, $\log \hat K_m$, and AIC(m) for m = 1, ..., M. Obtain optimal order by AIC criterion.

The  two  sample  non-parametric  data  modeling  procedures  described  in 
this  paper  have  been  implemented  in  a  computer  program  called  TWOSAM.  I 
would  like  to  acknowledge  the  contributions  to  this  program  of  the 
following  colleagues  during  the  course  of  their  Ph.D.  studies:  Jean-Pierre 
Carmichael,  Mike  White,  Tom  Prihoda,  Scott  Anderson,  Phil  Spector,  and  Avi 
Harpaz  (who  deserves  special  thanks  for  the  current  version  of  the  program). 

To  illustrate  how  the  quantile  approach  to  data  analysis  could  be 
presented  to  students  in  an  introductory  statistics  class,  we  consider  a 

data  set  analyzed  by  Larsen  and  Marx  (1981),  p.  324. 

An important problem of two sample data analysis arises in cases of disputed authorship. Were the 10 essays published in 1861 by "Quintus Curtius Snodgrass" actually written by Mark Twain? Let X and Y respectively denote the proportion of three-letter words in (eight) Twain essays and (ten) Snodgrass essays. For ease of writing, the sample values $X_1, \ldots, X_8$, $Y_1, \ldots, Y_{10}$ are multiplied by 1000 and 200 is subtracted. The samples then have order statistics:

X:  17,  17,  25,  29,  30,  35,  40,  62 

Y:  -4,  1,  2,  5,  7,  9,  10,  20,  23,  24. 

A  typical  data  analysis  might  include  the  following  diagnostics. 

I. An analysis of the two samples based on the t-test yields a t-value of 3.86 and rejects $H_0$ [at the .002 level]. That the distributions of X and Y are very non-normal can be quickly examined by plotting the informative quantile functions of the samples. For a sample quantile function $\tilde Q(u)$ the informative quantile function IQ(u), which represents Q(u) normalized so that Q(0.5) = 0 and approximately Q'(0.5) = 1, is estimated by

$$\widetilde{IQ}(u) = \frac{\tilde Q(u) - \tilde Q(0.5)}{2\{\tilde Q(0.75) - \tilde Q(0.25)\}}$$

For a random sample $X_1, \ldots, X_m$, with order statistics $X_{(1)} \le \cdots \le X_{(m)}$, we define $\tilde Q(j/(m+1)) = X_{(j)}$, j = 1, ..., m; $\tilde Q(u)$ is defined by linear interpolation for other values of u. With this convention we obtain


    u         1/9    2/9    3/9    4/9    5/9    6/9    7/9    8/9
    IQx(u)   -.38   -.38   -.14   -.02    .02    .17    .32    .98

u 

1/11 

2/11 

3/11 

4/11 

5/11 

6/11 

7/11 

8/11 

9/11  1 

IQy(u) 

-.52 

-.26 

-.13 

-.04 

.04 

.09 

.52 

.67 

These informative quantile functions indicate shorter-tailed distributions than the normal. A test based on the t-statistic might still be defended by those who believe that robustness justifies such procedures [this may be true only for distributions for which IQ(u) is not too asymmetric].

II. Conventional two-sample procedure. Apply a Wilcoxon rank sum test. Let $R_j$ denote the ranks in the pooled sample of the X values.

R: 8, 9, 13, 14, 15, 16, 17, 18

One desires to test the significance (concerning equality of populations) of the rank sum equal to 110, or equivalently of the statistic

$$T = \frac{1}{m}\sum_{j=1}^{m}\frac{R_j}{N+1}$$

Note E[T] = 0.5. For the Mark Twain data, T = 110/152 = .7237. The variance of T is .0055; therefore $(T - E[T])/\sigma(T) = 3.02$. One concludes that the hypothesis $H_0$ that Twain wrote the Snodgrass papers is rejected [at the .001 level, using the normal approximation].
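The rank-sum arithmetic above can be reproduced directly from the two samples. The sketch below is illustrative only; it assumes the standard null variance m n (N+1)/12 of the rank sum, which gives Var[T] = n / (12 m (N+1)), and the variable names are not taken from TWOSAM.

```python
import numpy as np

x = np.array([17, 17, 25, 29, 30, 35, 40, 62])        # Twain essays
y = np.array([-4, 1, 2, 5, 7, 9, 10, 20, 23, 24])     # Snodgrass essays
m, n = len(x), len(y)
N = m + n

pooled = np.concatenate([x, y])
# Ordinal ranks 1..N in the pooled sample. The only tie (17, 17) lies within
# the X sample, so the rank sum does not depend on how it is broken.
order = np.argsort(pooled, kind="stable")
ranks = np.empty(N, dtype=int)
ranks[order] = np.arange(1, N + 1)
ranks_x = np.sort(ranks[:m])

rank_sum = ranks_x.sum()
T = rank_sum / (m * (N + 1))
var_T = n / (12 * m * (N + 1))    # from Var(rank sum) = m n (N+1) / 12
z = (T - 0.5) / np.sqrt(var_T)

print(ranks_x, rank_sum, round(T, 4), round(var_T, 4), round(z, 2))
```

The standardized value agrees with the 3.02 quoted in the text, so the normal approximation and the variance formula assumed here appear to be the ones used there.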

III. A graphical test can lead to a firm conclusion. An alternative to computing a statistic and determining its significance level is to plot $\tilde D(u)$, using the fact that it is a distribution function with jumps of size 1/m at the points $R_j/N$. $\tilde D(u)$ has the following values:

    u       8/18   9/18   13/18  14/18  15/18  16/18  17/18  18/18
            .444   .5     .722   .778   .833   .889   .944   1.0
    D(u)    1/8    2/8    3/8    4/8    5/8    6/8    7/8    8/8
            .125   .25    .375   .5     .625   .75    .875   1.0

The graph of $\tilde D(u)$ is always below the uniform $D_0(u) = u$; we conclude that no reasonable test procedure would decide that Twain wrote the Snodgrass papers.
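The claim that the graph never rises above the diagonal is easy to check on a grid. The following sketch builds the step function with jumps of size 1/m at $R_j/N$ and measures its gap from $D_0(u) = u$; comparing the largest gap with a pointwise Brownian Bridge standard deviation is an informal illustration added here, not a formal test from the paper.

```python
import numpy as np

ranks_x = np.array([8, 9, 13, 14, 15, 16, 17, 18])
m, N = len(ranks_x), 18
lam = m / N

def D_tilde(u):
    """Step function with jumps of size 1/m at the points R_j / N."""
    return np.searchsorted(ranks_x / N, u, side="right") / m

grid = np.linspace(0.0, 1.0, 1001)
gap = grid - D_tilde(grid)            # u - D(u); nonnegative when D lies below
max_gap = gap.max()
u_star = grid[gap.argmax()]

# Pointwise standard deviation of a Brownian Bridge scaled by (1-lam)/(lam N).
sd_star = np.sqrt((1 - lam) / (lam * N) * u_star * (1 - u_star))
print(gap.min(), round(max_gap, 3), round(u_star, 3), round(max_gap / sd_star, 1))
```

The gap is zero only at the endpoints and is largest just before the third jump, at u slightly below 13/18.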

IV. Pseudo-correlations. The following table lists for the Mark Twain data the squared modulus $|\rho(v)|^2$ of the pseudo-correlations of lag v:

    v              1       3       4
    |rho(v)|^2   .2527   .0652   .1196

Since $2N\lambda/(1-\lambda) = 28.8$, the pseudo-correlation of lag 1 indicates that $H_0$ should be rejected [at the .025 level; $28.8\,|\rho(1)|^2 = 7.3$].
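The pseudo-correlations can be recomputed from the ranks alone. The sketch below assumes that $\rho(v) = \int_0^1 e^{2\pi i u v}\,d\tilde D(u)$ with jumps of size 1/m placed at $R_j/(N+1)$; that placement is an assumption, chosen because it appears to reproduce the tabulated values, and the function name is hypothetical.

```python
import numpy as np

ranks_x = np.array([8, 9, 13, 14, 15, 16, 17, 18])   # ranks of the Twain essays
m, N = len(ranks_x), 18
lam = m / N

# Assumed convention: jumps of size 1/m at u = R_j / (N + 1).
u = ranks_x / (N + 1)

def pseudo_correlation(v):
    """rho(v): the integral of exp(2 pi i u v) against the step function."""
    return np.mean(np.exp(2j * np.pi * v * u))

scale = 2 * N * lam / (1 - lam)                       # 28.8 for m = 8, N = 18
sq_mod = {v: abs(pseudo_correlation(v)) ** 2 for v in (1, 2, 3, 4)}
chi2_lag1 = scale * sq_mod[1]
p_value_lag1 = np.exp(-chi2_lag1 / 2)                 # chi-squared(2) upper tail

for v, value in sq_mod.items():
    print(v, round(value, 4))
print(round(chi2_lag1, 1), round(p_value_lag1, 3))
```

For a chi-squared variable with 2 degrees of freedom the upper tail probability is exp(-x/2), so no statistics library is needed for the significance level.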

V. Entropy and AIC. The following table lists for the Mark Twain data the entropy $-\log \hat K_m$, and order determining criterion AIC(m), for m = 1, 2, ..., 5:

    m            1       2       3        4        5
    -log K_m    .291    .696    1.669    1.980    2.740
    AIC(m)     -.152   -.418   -1.252   -1.424   -2.046

One rejects $H_0$ because AIC(m) < 0 for some $m \ge 1$ (and indeed AIC is negative for all the values of m listed above). No optimal order $\hat m$ is chosen because AIC(m) does not achieve a relative minimum among the orders listed.
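The entropy and AIC columns can be approximated by solving the Yule-Walker equations directly. This is a sketch under stated assumptions, not the TWOSAM code: jumps of the sample D(u) are placed at $R_j/(N+1)$, $\rho(-v)$ is the conjugate of $\rho(v)$, and the AIC penalty per order, $2(1-\lambda)/(N\lambda)$, is inferred from the differences between the two tabulated rows. Agreement was checked by hand only for m = 1 and m = 2.

```python
import numpy as np

ranks_x = np.array([8, 9, 13, 14, 15, 16, 17, 18])
m_x, N = len(ranks_x), 18
lam = m_x / N
u = ranks_x / (N + 1)    # assumed jump points of the sample D(u)

def rho(v):
    """Pseudo-correlation of lag v (negative lags give the conjugate)."""
    return np.mean(np.exp(2j * np.pi * v * u))

def residual_variance(order):
    """K_m from a direct linear solve of the order-m Yule-Walker equations."""
    idx = np.arange(1, order + 1)
    # Equation k: sum_{j=1..m} alpha(j) rho(j - k) = -rho(-k), with alpha(0) = 1.
    A = np.array([[rho(j - k) for j in idx] for k in idx])
    rhs = -np.array([rho(-k) for k in idx])
    alpha = np.linalg.solve(A, rhs)
    return (1.0 + np.sum(alpha * np.array([rho(j) for j in idx]))).real

penalty = 2 * (1 - lam) / (N * lam)   # per-order penalty inferred from the table
table = {}
for order in range(1, 6):
    K = residual_variance(order)
    table[order] = (-np.log(K), np.log(K) + penalty * order)
    print(order, round(table[order][0], 3), round(table[order][1], 3))
```

A direct solve is used here instead of the recursive algorithm so that the two approaches can serve as independent checks on each other.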
VI. Graphs of autoregressive density estimators $\hat d_m(u)$. When an order $\hat m$ is determined one considers $\hat d_{\hat m}(u)$ as an estimator of d(u). For the Mark Twain data, where the two samples are almost disjoint, no order is determined. The graphs of $\hat D_m(u)$ also indicate that a satisfactory estimator is not achieved among m = 1, ..., 5. Since the sample sizes are so small here, one hesitates to consider larger values of m.


Actual graphs produced by TWOSAM are not included in this paper because the paper is too long and for the Snodgrass example the graphs are not actually needed to draw conclusions about $H_0$. Of course, one should study the graphs in order to discern information not contained in the numbers proposed as diagnostic measures.

Epilogue.  What  do  we  see  as  the  future  of  the  FUN.STAT  Quantile  approach  to 
two-sample  data  analysis?  It  aims  to  provide  statisticians  with  (1)  new 
procedures  which  can  detect  differences  in  populations  which  are  not  diagnosed 
by  conventional  procedures,  and  (2)  diagnostics  of  distributional  shape 
which  can  enhance  confidence  in  the  use  of  conventional  procedures.  The 
theory  of  the  new  procedures  is  asymptotic,  but  they  are  practical  to  use 
in  both  very  small  and  very  large  samples.  The  investigation  of  their 
properties,  especially  in  small  samples  by  Monte-Carlo  methods,  can  be 
considered  to  provide  many  important  research  problems.  We  would  like  to 
emphasize  our  belief  that  it  is  unwise  to  rely  on  pure  graphical  data 
analysis  based  only  on  graphs  which  are  not  accompanied  by  diagnostic 
measures.  FUN.STAT  facilitates  estimation  of  entropy  and  information  measures 
which  are  particularly  useful  summary  measures  because  they  may  provide 
comparisons between parametric and non-parametric analysis of a data set
[see Parzen (1983)].